Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments
Ghimire, Sangam, Timalsina, Paribartan, Bhurtel, Nirjal, Neupane, Bishal, Shrestha, Bigyan Byanju, Bhattarai, Subarna, Gaire, Prajwal, Thapa, Jessica, Jha, Sudan
As AI models continue to grow in complexity and size, so does the demand for vast computational resources and access to large-scale distributed datasets. At the same time, growing concerns about data privacy, ownership, and regulatory compliance make it increasingly difficult to centralize data for training. FL has emerged as a promising paradigm for addressing these challenges, enabling collaborative model training across multiple data silos without requiring the raw data to leave its source. While FL has gained traction in mobile and edge environments, such as smartphones and IoT devices, its application in large-scale computing platforms like HPC clusters and cloud infrastructure remains underexplored. Meanwhile, the convergence of HPC and cloud computing is reshaping the landscape of modern data-intensive applications. These hybrid environments combine the raw power and efficiency of HPC with the scalability and flexibility of the cloud, making them well-suited for training large AI models. However, this integration brings new challenges: heterogeneous hardware (e.g., Central Processing Units (CPUs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs)), inconsistent network performance, dynamic resource availability, and non-uniform data distributions across clients. In this context, the deployment of federated learning across such mixed infrastructure is both a timely opportunity and a technical challenge. This paper explores how FL can be adapted and optimized to run efficiently across heterogeneous HPC and cloud environments, with a focus on scalability, system resilience, and performance under non-IID data conditions.
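The aggregation step at the heart of the federated setup described above can be illustrated with a minimal FedAvg sketch. This is not the paper's implementation; the layer name and client sizes are illustrative, and the weighting by local sample count is the standard way FedAvg handles the non-uniform data distributions the abstract mentions.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg).

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    avg = {}
    for name in client_weights[0]:
        avg[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return avg

# Two simulated clients with unequal data volumes (a toy non-IID setting):
c1 = {"dense": np.array([1.0, 1.0])}
c2 = {"dense": np.array([3.0, 3.0])}
merged = fedavg([c1, c2], client_sizes=[1, 3])
print(merged["dense"])  # weights skew toward the larger client: [2.5, 2.5]
```

Only model parameters cross the network; the raw local data never leaves its silo, which is the privacy property the paragraph above relies on.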
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- (2 more...)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
A Hybrid Proactive And Predictive Framework For Edge Cloud Resource Management
Kumar, Hrikshesh, Garg, Anika, Gupta, Anshul, Agarwal, Yashika
Traditional edge-cloud workload resource management is too reactive. The problem with relying on static thresholds is that we either overspend on more resources than needed or suffer reduced performance for lack of them. This is why we work on proactive solutions: a framework that stops reacting to problems and starts anticipating them. We design a hybrid architecture combining two powerful tools: a CNN-LSTM model for time-series forecasting and an orchestrator based on multi-agent Deep Reinforcement Learning (DRL). The novelty lies in how we combine them: we embed the predictive forecast from the CNN-LSTM directly into the DRL agent's state space. That is what makes the AI manager smarter: it sees the future, which allows it to make better long-term decisions about where to run tasks, finding the sweet spot between saving money and keeping the system healthy and applications fast for users. In effect, we have given it eyes to see down the road, so that it does not lurch from one problem to another but finds a smooth path forward. Our tests show that our system clearly beats the older methods. It excels at tough problems such as making complex decisions and juggling multiple goals at once: being cheap, fast, and reliable.
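The key architectural idea, embedding the forecast in the agent's state, can be sketched simply. A minimal sketch, assuming a CNN-LSTM that emits a short horizon of predicted metrics; the metric names, horizon length, and shapes here are hypothetical, not taken from the paper.

```python
import numpy as np

def build_agent_state(current_metrics, forecast):
    """Concatenate live system metrics with the CNN-LSTM forecast so the
    DRL agent 'sees the future' when choosing a placement action.

    current_metrics: shape (n_metrics,),  e.g. [cpu_util, mem_util, latency]
    forecast:        shape (horizon, n_metrics), predicted future metrics
    """
    return np.concatenate([current_metrics, forecast.ravel()])

now = np.array([0.62, 0.48, 0.15])            # observed utilisation
pred = np.array([[0.70, 0.50, 0.16],          # hypothetical CNN-LSTM output
                 [0.81, 0.55, 0.19]])         # for the next two intervals
state = build_agent_state(now, pred)
print(state.shape)  # (9,): the vector fed to the DRL policy network
```

Because the predicted utilisation rise is already in the observation, a trained policy can migrate or scale workloads before the threshold would have fired, rather than after.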
- Information Technology > Security & Privacy (1.00)
- Energy (0.94)
- Transportation (0.68)
- Telecommunications (0.67)
Machine learning-based cloud resource allocation algorithms: a comprehensive comparative review
Cloud resource allocation has emerged as a major challenge in modern computing environments, with organizations struggling to manage complex, dynamic workloads while optimizing performance and cost efficiency. Traditional heuristic approaches prove inadequate for handling the multi-objective optimization demands of existing cloud infrastructures. This paper presents a comparative analysis of state-of-the-art artificial intelligence and machine learning algorithms for resource allocation. We systematically evaluate 10 algorithms across four categories: Deep Reinforcement Learning approaches, Neural Network architectures, Traditional Machine Learning enhanced methods, and Multi-Agent systems. Analysis of published results demonstrates significant performance improvements across multiple metrics including makespan reduction, cost optimization, and energy efficiency gains compared to traditional methods. The findings reveal that hybrid architectures combining multiple artificial intelligence and machine learning techniques consistently outperform single-method approaches, with edge computing environments showing the highest deployment readiness. Our analysis provides critical insights for both academic researchers and industry practitioners seeking to implement next-generation cloud resource allocation strategies in increasingly complex and dynamic computing environments.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (9 more...)
- Overview (1.00)
- Research Report (0.84)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Law (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Artificial Intelligence for Cost-Aware Resource Prediction in Big Data Pipelines
Efficient resource allocation is a key challenge in modern cloud computing. Over-provisioning leads to unnecessary costs, while under-provisioning risks performance degradation and SLA violations. This work presents an artificial intelligence approach to predicting resource utilization in big data pipelines using Random Forest regression. We preprocess the Google Borg cluster traces to clean, transform, and extract relevant features (CPU, memory, usage distributions). The model achieves high predictive accuracy (R² = 0.99, MAE = 0.0048, RMSE = 0.137), capturing non-linear relationships between workload characteristics and resource utilization. Error analysis reveals strong performance on small-to-medium jobs, with higher variance in rare large-scale jobs. These results demonstrate the potential of AI-driven prediction for cost-aware autoscaling in cloud environments, reducing unnecessary provisioning while safeguarding service quality.
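The regression setup described above can be sketched with scikit-learn on synthetic data. This is a stand-in, not the paper's pipeline: the features, target function, and split are invented to mimic the non-linear workload-to-utilization relationship the abstract describes, and the resulting error is not comparable to the reported metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Synthetic stand-in for Borg-style features: requested CPU, requested
# memory, priority; target = actual CPU usage with a non-linear interaction.
X = rng.uniform(0, 1, size=(500, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 0] * X[:, 1] + 0.05 * X[:, 2]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])                      # simple time-ordered split
mae = mean_absolute_error(y[400:], model.predict(X[400:]))
print(f"hold-out MAE: {mae:.4f}")
```

The same fit-then-predict loop, fed live workload features, is what would drive a cost-aware autoscaler: scale to the predicted utilization plus a safety margin instead of a fixed over-provisioned quota.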
Towards Generalizable Context-aware Anomaly Detection: A Large-scale Benchmark in Cloud Environments
Zou, Xinkai, Jiang, Xuan, Huang, Ruikai, He, Haoze, Kapoor, Parv, Wu, Hongrui, Wang, Yibo, Sha, Jian, Shi, Xiongbo, Huang, Zixun, Zhao, Jinhua
Anomaly detection in cloud environments remains both critical and challenging. Existing context-level benchmarks typically focus on either metrics or logs and often lack reliable annotation, while most detection methods emphasize point anomalies within a single modality, overlooking contextual signals and limiting real-world applicability. Constructing a benchmark for context anomalies that combines metrics and logs is inherently difficult: reproducing anomalous scenarios on real servers is often infeasible or potentially harmful, while generating synthetic data introduces the additional challenge of maintaining cross-modal consistency. Ensuring the stability and availability of large-scale cloud systems is of great importance (Kazemzadeh & Jacobsen, 2009; Bu et al., 2018; Zhang et al., 2015). Accurate detection methods that can also discriminate among anomaly scenarios are essential to mitigate potential losses (Zhang et al., 2018; Barbhuiya et al., 2018a). Large-scale cloud systems usually generate abundant logs and expose various metrics, both of which serve as some of the most valuable data sources for anomaly detection (Lin et al., 2016; Nandi et al., 2016). Numerous benchmarks have been proposed for cloud anomaly detection, such as (Oliner & Stearley, 2007; Xu et al., 2009; Akmeemana et al., 2025). However, most existing research and benchmarks for cloud anomaly detection have focused on point anomalies, where deviations are identified in isolation within a single modality, such as metrics or logs. Although these benchmarks have provided the community with relevant evaluation testbeds, they capture only a narrow slice of the anomaly landscape and often fail to reflect the complexity of real cloud environments.
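The distinction the abstract draws between point and context anomalies can be made concrete with a small sketch: a value that is globally unremarkable becomes anomalous relative to its local context. This is an illustrative detector, not a method from the paper; the window size and scoring rule are arbitrary choices.

```python
import numpy as np

def context_anomaly_scores(series, window=10):
    """Score each point by its deviation from a local (contextual) baseline
    rather than the global distribution: |x_t - local_mean| / local_std."""
    scores = np.zeros(len(series))
    for t in range(window, len(series)):
        ctx = series[t - window:t]
        std = ctx.std() or 1e-9          # guard against zero-variance windows
        scores[t] = abs(series[t] - ctx.mean()) / std
    return scores

# A value of 5.0 is globally normal but anomalous inside the quiet regime.
metric = np.concatenate([np.full(20, 5.0), np.full(20, 1.0)])
metric[30] = 5.0  # contextual anomaly: same value, wrong context
scores = context_anomaly_scores(metric)
print(scores[30] > scores[15])  # True: flagged only where context disagrees
```

A pure point-anomaly detector thresholding on the global distribution would miss `metric[30]` entirely, which is exactly the gap in existing benchmarks the paragraph above identifies.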
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Services (0.93)
- Energy (0.68)
CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload
Shahbazinia, Amirhossein, Huang, Darong, Costero, Luis, Atienza, David
Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads due to their scalability, flexibility, and cost-efficiency. In multi-tenant cloud environments, virtual machines are consolidated on shared physical servers to improve resource utilization. While virtualization guarantees resource partitioning for CPU, memory, and storage, it cannot ensure performance isolation. Competition for shared resources such as last-level cache, memory bandwidth, and network interfaces often leads to severe performance degradation. Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction to mitigate interference. However, this remains challenging in public clouds due to the black-box nature of VMs and the highly dynamic nature of workloads. To address these limitations, we propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments. CloudFormer jointly models temporal dynamics and system-level interactions, leveraging 206 system metrics at one-second resolution across both static and dynamic scenarios. This design enables the model to capture transient interference effects and adapt to varying workload conditions without scenario-specific tuning. Complementing the methodology, we provide a fine-grained dataset that significantly expands the temporal resolution and metric diversity compared to existing benchmarks. Experimental results demonstrate that CloudFormer consistently outperforms state-of-the-art baselines across multiple evaluation metrics, achieving robust generalization across diverse and previously unseen workloads. Notably, CloudFormer attains a mean absolute error (MAE) of just 7.8%, representing a substantial improvement in predictive accuracy and outperforming existing methods by at least 28%.
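The building block a Transformer branch applies to such telemetry is scaled dot-product attention, which relates every time step of the metric window to every other. The sketch below shows only that generic core in NumPy, with invented shapes; CloudFormer's actual dual-branch architecture, projections, and training are not reproduced here.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable row-wise softmax over the time axis:
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
T, d = 8, 16               # 8 one-second steps, 16-dim projected metrics
x = rng.normal(size=(T, d))
out = attention(x, x, x)   # self-attention over the telemetry window
print(out.shape)  # (8, 16)
```

Because every output step is a weighted mix of all input steps, attention can pick up the transient, cross-step interference effects that fixed-window models tend to smear out.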
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Spain > Galicia > Madrid (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- (2 more...)
SLA-Centric Automated Algorithm Selection Framework for Cloud Environments
Rizwan, Siana, Ahmed, Tasnim, Choudhury, Salimur
Cloud computing offers on-demand resource access, regulated by Service-Level Agreements (SLAs) between consumers and Cloud Service Providers (CSPs). SLA violations can impact efficiency and CSP profitability. In this work, we propose an SLA-aware automated algorithm-selection framework for combinatorial optimization problems in resource-constrained cloud environments. The framework uses an ensemble of machine learning models to predict performance and rank algorithm-hardware pairs based on SLA constraints. We also apply our framework to the 0-1 knapsack problem. We curate a dataset comprising instance-specific features along with memory usage, runtime, and optimality gap for 6 algorithms. As an empirical benchmark, we evaluate the framework on both classification and regression tasks. Our ablation study explores the impact of hyperparameters, learning approaches, and the effectiveness of large language models in regression, as well as SHAP-based interpretability.
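For readers unfamiliar with the benchmark problem, the 0-1 knapsack the framework is applied to admits a compact exact solution by dynamic programming; this is one of the candidate algorithms such a selector might rank, not code from the paper.

```python
def knapsack_01(values, weights, capacity):
    """Classic 0-1 knapsack DP: dp[c] = best value achievable with capacity c.
    Iterating items outermost and capacity downward enforces use-at-most-once."""
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

best = knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
print(best)  # 220: items of weight 20 and 30
```

The DP runs in O(n x capacity) time and memory, while greedy heuristics run faster but leave an optimality gap; exactly this runtime/memory/gap trade-off across algorithms is what the proposed selector learns to predict against SLA constraints.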
Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs
In the age of cloud computing, data privacy protection has become a major challenge, especially when sharing sensitive data across cloud environments. However, how to optimize collaboration across cloud environments remains an unresolved problem. In this paper, we combine federated learning with large language models to optimize the collaborative mechanism of AI systems. Building on the existing federated learning framework, we introduce a cross-cloud architecture in which federated learning aggregates model updates from decentralized nodes without exposing the original data. At the same time, the powerful contextual and semantic understanding capabilities of large language models are used to improve model training efficiency and decision-making ability. We further innovate by introducing a secure communication layer to ensure the privacy and integrity of model updates and training data. The model enables continuous adaptation and fine-tuning across different cloud environments while protecting sensitive data. Experimental results show that the proposed method significantly outperforms the traditional federated learning model in terms of accuracy, convergence speed, and data privacy protection.
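One standard way to give cross-cloud model updates the integrity guarantee a secure communication layer provides is to authenticate each update with an HMAC. A minimal sketch with Python's standard library; the shared key and update shape are placeholders, and the paper's actual layer (which would also need encryption and key management) is not specified here.

```python
import hashlib
import hmac
import numpy as np

SHARED_KEY = b"hypothetical-per-node-key"  # placeholder, not from the paper

def sign_update(update: np.ndarray) -> bytes:
    """Attach an HMAC-SHA256 tag so the aggregator can verify the update's
    integrity without ever seeing the node's raw training data."""
    return hmac.new(SHARED_KEY, update.tobytes(), hashlib.sha256).digest()

def verify_update(update: np.ndarray, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign_update(update), tag)

delta = np.array([0.01, -0.02, 0.005])   # a node's model-weight delta
tag = sign_update(delta)
print(verify_update(delta, tag))         # True: untampered update accepted
print(verify_update(delta + 1e-6, tag))  # False: any tampering is detected
```

Only the weight delta and its tag cross cloud boundaries; the training data itself stays on the originating node, matching the federated premise above.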
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning
Saqib, Muhammad, Mehta, Dipkumar, Yashu, Fnu, Malhotra, Shubham
The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity [1]. This paper addresses the limitations of static policies by proposing a security policy management framework that uses reinforcement learning (RL) to adapt dynamically. Specifically, we employ deep reinforcement learning algorithms, including Deep Q-Networks and Proximal Policy Optimization, enabling the learning and continuous adjustment of controls such as firewall rules and Identity and Access Management (IAM) policies. The proposed RL-based solution leverages cloud telemetry data (AWS CloudTrail logs, network traffic data, threat intelligence feeds) to continuously refine security policies, maximizing threat mitigation and compliance while minimizing resource impact. Experimental results demonstrate that our adaptive RL-based framework significantly outperforms static policies, achieving higher intrusion detection rates (92% compared to 82% for static policies) and substantially reducing incident detection and response times by 58%. In addition, it maintains high conformity with security requirements and efficient resource usage.

I. INTRODUCTION

Cloud security is a critical concern as more organizations rely on cloud infrastructure. AWS and other cloud platforms provide security configurations such as firewall rules and IAM policies, which are typically managed through static policies set by administrators. However, static policies cannot adapt to the dynamic nature of cloud environments, where workloads, users, and attack patterns change rapidly [1]. This rigidity exposes cloud deployments to new threats or misconfigurations that are not covered by static rules. For instance, static firewall rules may fail to detect novel attack patterns, and fixed IAM roles may become over-privileged as resources scale, increasing risk.

Problem Statement: Traditional cloud security policy management cannot keep pace with evolving threats and agile DevOps practices. Manual policy updates are error-prone and slow.
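The adapt-the-policy-from-feedback loop can be illustrated with tabular Q-learning, a simpler relative of the Deep Q-Networks the paper employs. Everything here is a toy: the two threat levels, two posture actions, and reward shaping are invented for illustration, not drawn from the paper's AWS telemetry setup.

```python
import random

# Toy MDP: states are threat levels, actions adjust the firewall posture.
STATES = ["low", "high"]
ACTIONS = ["relax", "tighten"]

def reward(state, action):
    # Hypothetical shaping: tightening pays off under threat; relaxing when
    # calm avoids the resource/usability cost of overly strict rules.
    if state == "high":
        return 1.0 if action == "tighten" else -1.0
    return 0.5 if action == "relax" else -0.2

random.seed(0)
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.2        # learning rate, discount, exploration
state = "low"
for _ in range(2000):
    # Epsilon-greedy action selection:
    a = random.choice(ACTIONS) if random.random() < eps else max(
        ACTIONS, key=lambda x: Q[(state, x)])
    r = reward(state, a)
    nxt = random.choice(STATES)          # threat level drifts independently here
    # Q-learning update toward reward plus discounted best next-state value:
    Q[(state, a)] += alpha * (r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
                              - Q[(state, a)])
    state = nxt

print(max(ACTIONS, key=lambda a: Q[("high", a)]))  # learned: "tighten"
```

A deep variant replaces the Q-table with a neural network so the state can be a high-dimensional telemetry vector (CloudTrail events, traffic features) rather than a two-value label, which is what makes the approach viable at cloud scale.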
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Europe > Latvia > Riga Municipality > Riga (0.04)
- Asia > Middle East > Bahrain > Capital Governorate > Manama (0.04)
- (8 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
AI-Driven Security in Cloud Computing: Enhancing Threat Detection, Automated Response, and Cyber Resilience
Shaffi, Shamnad Mohamed, Vengathattil, Sunish, Sidhick, Jezeena Nikarthil, Vijayan, Resmi
Cloud security concerns have grown markedly in recent years as threats in the computing world have become more complex. Many traditional solutions do not work well in real time to detect or prevent these more sophisticated threats. Artificial intelligence is today regarded as a revolution in building a protection plan for cloud data architecture through machine learning, statistical visualization of computing infrastructure, and detection of security breaches followed by counteraction. These AI-enabled systems ease the work as more network activity is scrutinized, and any anomalous behavior that might be a precursor to a more serious breach is prevented. This paper examines ways AI can enhance cloud security by applying predictive analytics, behavior-based security threat detection, and AI-stirring encryption. It also outlines the problems of previous security models and how AI overcomes them. Relatedly, issues such as data privacy, biases in the AI model, and regulatory compliance are also covered. In sum, AI improves the protection of cloud computing environments; however, more effort is needed in subsequent phases to extend the technology's reliability, modularity, and ethical grounding. AI can also be blended with other emerging computing technologies, including blockchain, to improve security frameworks further. The paper discusses the current trends in securing cloud data architecture using AI and presents further research and application directions.
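The behavior-based detection idea above, flagging activity that deviates from an entity's own baseline, can be sketched with a simple per-account statistical profile. The account, metric, and threshold are hypothetical; real systems would use richer features and learned models rather than a fixed k-sigma rule.

```python
import statistics

def build_profile(history):
    """Per-entity behavioural baseline from past activity counts."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(value, profile, k=3.0):
    """Flag behaviour more than k standard deviations from the baseline,
    the breach-precursor idea described above (illustrative threshold)."""
    mean, std = profile
    return abs(value - mean) > k * std

# Hypothetical per-hour API-call counts for one service account:
history = [98, 102, 100, 97, 103, 101, 99, 100]
profile = build_profile(history)
print(is_anomalous(104, profile))   # False: within normal variation
print(is_anomalous(450, profile))   # True: candidate breach precursor
```

Profiling each account against itself, rather than against a global norm, is what lets such systems catch quiet precursors (an account suddenly enumerating resources) that signature-based tools miss.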
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.70)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.66)